medical code
KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings
Elhussein, Ahmed, Meddeb, Paul, Newbury, Abigail, Mirone, Jeanne, Stoll, Martin, Gursoy, Gamze
Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific auxiliary or end-to-end training, enabling KEEP to support multiple downstream applications and model architectures. Evaluations on structured EHR from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and Language Model-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments. Data and Code Availability: This research has been conducted using data from UK Biobank (Sudlow et al., 2015) and MIMIC-IV (Johnson et al., 2021). Researchers can request access via https://www.ukbiobank.ac.uk/ and https://physionet.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.77)
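The KEEP abstract describes starting from knowledge-graph embeddings and then refining them on patient data under a regularizer that preserves ontological structure. The Python sketch below illustrates that general idea only. The objective (fit dot products to log co-occurrence counts, plus an L2 penalty toward the initial embeddings), the function name `refine_embeddings`, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def refine_embeddings(E0, cooc, lam=50.0, lr=0.005, steps=500):
    """Gradient descent on an assumed objective (not the paper's):
    sum_{i != j} (e_i . e_j - log(1 + c_ij))^2 + lam * ||E - E0||^2
    E0: initial (e.g. ontology-derived) embeddings; cooc: symmetric counts.
    """
    E = E0.copy()
    target = np.log1p(cooc)
    for _ in range(steps):
        resid = E @ E.T - target
        np.fill_diagonal(resid, 0.0)            # ignore self co-occurrence
        grad_fit = 2.0 * (resid + resid.T) @ E  # gradient of the data-fit term
        grad_reg = 2.0 * lam * (E - E0)         # pull back toward E0
        E -= lr * (grad_fit + grad_reg)
    return E

# Toy example: 4 hypothetical codes, 2-dimensional embeddings.
rng = np.random.default_rng(0)
E0 = rng.normal(scale=0.1, size=(4, 2))
cooc = np.array([[0, 5, 0, 1],
                 [5, 0, 2, 0],
                 [0, 2, 0, 4],
                 [1, 0, 4, 0]], dtype=float)

anchored = refine_embeddings(E0, cooc, lam=50.0)  # strong regularization
free = refine_embeddings(E0, cooc, lam=0.0)       # no regularization
print(np.linalg.norm(anchored - E0), np.linalg.norm(free - E0))
```

In this sketch the `lam` parameter controls the trade-off the abstract mentions: a large value keeps embeddings close to their initial positions, while zero lets the co-occurrence data move them freely.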
Structure-aware Hypergraph Transformer for Diagnosis Prediction in Electronic Health Records
Electronic Health Records (EHR) systematically organize patient health data through standardized medical codes, serving as a comprehensive and invaluable source for predictive modeling. Graph neural networks (GNNs) have demonstrated effectiveness in modeling interactions between medical codes within EHR. However, existing GNN-based methods are inadequate for two reasons: a) their reliance on pairwise relations fails to capture the inherent higher-order dependencies in clinical data, and b) their localized message-passing scheme limits representation power. To address these issues, this paper proposes a novel Structure-aware HyperGraph Transformer (SHGT) framework built on three ideas: a) employing a hypergraph structural encoder to capture higher-order interactions among medical codes, b) integrating the Transformer architecture to reason over the entire hypergraph, and c) designing a tailored loss function incorporating hypergraph reconstruction to preserve the hypergraph's original structure. Experiments on real-world EHR datasets demonstrate that the proposed SHGT outperforms existing state-of-the-art models on diagnosis prediction.
- Asia > Middle East > Israel (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Research Report > Experimental Study (0.66)
- Research Report > Promising Solution (0.48)
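To make the "higher-order interactions" in the SHGT abstract concrete, the Python sketch below treats each visit as a hyperedge over the codes it contains and runs one generic node-to-hyperedge-to-node averaging step. This is a textbook-style illustration, not the SHGT encoder. The function names, the mean aggregation, and the example codes are assumptions.

```python
import numpy as np

def incidence_matrix(visits, codes):
    """H[i, j] = 1 if code i appears in visit j (each visit is one hyperedge)."""
    index = {c: i for i, c in enumerate(codes)}
    H = np.zeros((len(codes), len(visits)))
    for j, visit in enumerate(visits):
        for c in visit:
            H[index[c], j] = 1.0
    return H

def hypergraph_propagate(H, X):
    """One round of mean aggregation: codes -> hyperedges -> codes."""
    edge_deg = np.maximum(H.sum(axis=0, keepdims=True), 1.0)  # codes per visit
    node_deg = np.maximum(H.sum(axis=1, keepdims=True), 1.0)  # visits per code
    edge_feat = (H / edge_deg).T @ X      # each hyperedge averages its codes
    return (H / node_deg) @ edge_feat     # each code averages its hyperedges

# Illustrative codes and visits (assumed example data).
codes = ["I10", "E11", "N18"]
visits = [["I10", "E11", "N18"], ["I10", "E11"]]
H = incidence_matrix(visits, codes)
X = np.eye(len(codes))                    # one-hot features per code
out = hypergraph_propagate(H, X)
print(H)
print(out)
```

A pairwise graph would split the first visit into three separate edges, whereas the single hyperedge column keeps the three codes together as one unit.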
ProtoEHR: Hierarchical Prototype Learning for EHR-based Healthcare Predictions
Cai, Zi, Liu, Yu, Luo, Zhiyao, Zhu, Tingting
Digital healthcare systems have enabled the collection of large volumes of healthcare data in electronic healthcare records (EHRs), allowing artificial intelligence solutions for various healthcare prediction tasks. However, existing studies often focus on isolated components of EHR data, limiting their predictive performance and interpretability. To address this gap, we propose ProtoEHR, an interpretable hierarchical prototype learning framework that fully exploits the rich, multi-level structure of EHR data to enhance healthcare predictions. More specifically, ProtoEHR models relationships within and across three hierarchical levels of EHRs: medical codes, hospital visits, and patients. We first leverage large language models to extract semantic relationships among medical codes and construct a medical knowledge graph as the knowledge source. Building on this, we design a hierarchical representation learning framework that captures contextualized representations across three levels, while incorporating prototype information within each level to capture intrinsic similarities and improve generalization. To perform a comprehensive assessment, we evaluate ProtoEHR on two public datasets and five clinically significant tasks: prediction of mortality, prediction of readmission, prediction of length of stay, drug recommendation, and prediction of phenotype. The results demonstrate the ability of ProtoEHR to make accurate, robust, and interpretable predictions compared to baselines in the literature. Furthermore, ProtoEHR offers interpretable insights on code, visit, and patient levels to aid in healthcare prediction.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Research Report > New Finding (0.34)
- Research Report > Experimental Study (0.34)
MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models
Kim, Junmo, Lee, Namkyeong, Kim, Jiwon, Kim, Kwangsoo
Electronic health record (EHR) foundation models have been an area ripe for exploration with their improved performance in various medical tasks. Despite the rapid advances, there exists a fundamental limitation: processing unseen, out-of-vocabulary medical codes. This problem limits the generalizability of EHR foundation models and the integration of models trained with different vocabularies. To alleviate this problem, we propose a set of novel medical concept representations (MedRep) for EHR foundation models based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). For concept representation learning, we enrich the information of each concept with a minimal definition through large language model (LLM) prompts and complement the text-based representations through the graph ontology of OMOP vocabulary. Our approach outperforms the vanilla EHR foundation model and the model with a previously introduced medical code tokenizer in diverse prediction tasks. We also demonstrate the generalizability of MedRep through external validation.
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Nebraska (0.04)
- North America > United States > Maryland > Montgomery County > Rockville (0.04)
- Asia > China (0.04)
HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
Chok, Yan Ting, Park, Soyon, Baek, Seungheun, Kim, Hajung, Lee, Junhyun, Kang, Jaewoo
Medication recommendation is a crucial task for assisting physicians in making timely decisions from longitudinal patient medical records. However, real-world EHR data present significant challenges due to the presence of rarely observed medical entities and incomplete records that may not fully capture the clinical ground truth. While data-driven models trained on longitudinal Electronic Health Records often achieve strong empirical performance, they struggle to generalize under missing or novel conditions, largely due to their reliance on observed co-occurrence patterns. To address these issues, we propose Hierarchical Ontology and Network Refinement for Robust Medication Recommendation (HiRef), a unified framework that combines two complementary structures: (i) the hierarchical semantics encoded in curated medical ontologies, and (ii) refined co-occurrence patterns derived from real-world EHRs. We embed ontology entities in hyperbolic space, which naturally captures tree-like relationships and enables knowledge transfer through shared ancestors, thereby improving generalizability to unseen codes. To further improve robustness, we introduce a prior-guided sparse regularization scheme that refines the EHR co-occurrence graph by suppressing spurious edges while preserving clinically meaningful associations. Our model achieves strong performance on EHR benchmarks (MIMIC-III and MIMIC-IV) and maintains high accuracy under simulated unseen-code settings. Extensive experiments with comprehensive ablation studies demonstrate HiRef's resilience to unseen medical codes, supported by in-depth analyses of the learned sparsified graph structure and medical code embeddings.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States (0.04)
- Asia > Middle East > Syria > Aleppo Governorate > Aleppo (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.77)
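The HiRef abstract says ontology entities are embedded in hyperbolic space because it naturally captures tree-like relationships. The abstract does not state which hyperbolic model is used, so the Python sketch below assumes the Poincaré ball and its standard distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The function name and the sample points are illustrative.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

# The same Euclidean offset (0.05) measured near the origin and near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.0, 0.05])
near_boundary = poincare_distance([0.9, 0.0], [0.9, 0.05])
print(near_origin, near_boundary)
```

Distances grow rapidly toward the boundary of the ball, which is why general concepts are typically placed near the origin and the many leaf-level codes near the edge, where there is room to keep them apart.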
The Anatomy of Evidence: An Investigation Into Explainable ICD Coding
Beckh, Katharina, Studeny, Elisa, Gannamaneni, Sujan Sai, Antweiler, Dario, Rüping, Stefan
Automatic medical coding has the potential to ease documentation and billing processes. For this task, transparency plays an important role for medical coders and regulatory bodies, which can be achieved using explainability methods. However, the evaluation of these approaches has been mostly limited to short text and binary settings due to a scarcity of annotated data. Recent efforts by Cheng et al. (2023) have introduced the MDACE dataset, which provides a valuable resource containing code evidence in clinical records. In this work, we conduct an in-depth analysis of the MDACE dataset and perform plausibility evaluation of current explainable medical coding systems from an applied perspective. With this, we contribute to a deeper understanding of automatic medical coding and evidence extraction. Our findings reveal that ground truth evidence aligns with code descriptions to a certain degree. An investigation into state-of-the-art approaches shows a high overlap with ground truth evidence. We propose match measures and highlight success and failure cases. Based on our findings, we provide recommendations for developing and evaluating explainable medical coding systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
DeepJ: Graph Convolutional Transformers with Differentiable Pooling for Patient Trajectory Modeling
Li, Deyi, Yao, Zijun, Liang, Muxuan, Liu, Mei
In recent years, graph learning has gained significant interest for modeling complex interactions among medical events in structured Electronic Health Record (EHR) data. However, existing graph-based approaches often work in a static manner, either restricting interactions within individual encounters or collapsing all historical encounters into a single snapshot. As a result, when it is necessary to identify meaningful groups of medical events spanning longitudinal encounters, existing methods are inadequate in modeling interactions across encounters while accounting for temporal dependencies. To address this limitation, we introduce Deep Patient Journey (DeepJ), a novel graph convolutional transformer model with differentiable graph pooling to effectively capture intra-encounter and inter-encounter medical event interactions. DeepJ can identify groups of temporally and functionally related medical events, offering valuable insights into key event clusters pertinent to patient outcome prediction. DeepJ significantly outperformed five state-of-the-art baseline models while enhancing interpretability, demonstrating its potential for improved patient risk stratification.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Kansas > Douglas County > Lawrence (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support
Ran, Wu Hao, Xi, Xi, Li, Furong, Lu, Jingyi, Jiang, Jian, Huang, Hui, Zhang, Yuzhuan, Li, Shi
The advent of large language models (LLMs) has opened new avenues for analyzing complex, unstructured data, particularly within the medical domain. Electronic Health Records (EHRs) contain a wealth of information in various formats, including free-text clinical notes, structured lab results, and diagnostic codes. This paper explores the application of advanced language models to leverage these diverse data sources for improved clinical decision support. We discuss how text-based features, often overlooked in traditional high-dimensional EHR analysis, can provide semantically rich representations and aid in harmonizing data across different institutions. Furthermore, we delve into the challenges and opportunities of incorporating medical codes and ensuring the generalizability and fairness of AI models in healthcare.
- North America > Canada (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records
Nahian, Md Sultan Al, Delcher, Chris, Harris, Daniel, Akpunonu, Peter, Kavuluru, Ramakanth
The ability to predict drug overdose risk from a patient's medical records is crucial for timely intervention and prevention. Traditional machine learning models have shown promise in analyzing longitudinal medical records for this task. However, recent advancements in large language models (LLMs) offer an opportunity to enhance prediction performance by leveraging their ability to process long textual data and their inherent prior knowledge across diverse tasks. In this study, we assess the effectiveness of OpenAI's GPT-4o LLM in predicting drug overdose events using patients' longitudinal insurance claims records. We evaluate its performance in both fine-tuned and zero-shot settings, comparing them to strong traditional machine learning methods as baselines. Our results show that LLMs not only outperform traditional models in certain settings but can also predict overdose risk in a zero-shot setting without task-specific training. Drug overdose (OD) is a major public health crisis in the United States, leading to a substantial number of emergency medical interventions and fatalities each year. According to the Centers for Disease Control and Prevention (CDC), drug overdoses claimed approximately 107,941 [1] lives in the U.S. in 2022, highlighting the urgent need for effective prevention and intervention strategies. Besides fatal outcomes and lost quality of life for patients, the misuse of prescription medications, illicit drugs, and polysubstance abuse has placed an immense burden on healthcare systems, emergency responders, and policymakers. Identifying individuals at risk early can facilitate timely interventions, such as targeted clinical assessments, behavioral support, and prescription monitoring, thereby reducing the likelihood of fatal outcomes.
Md Sultan Al Nahian is with the Institute for Biomedical Informatics, University of Kentucky, Lexington, KY 40536 USA. Chris Delcher and Daniel Harris are with the Department of Pharmacy Practice and Science, University of Kentucky, Lexington, KY 40536 USA. Peter Akpunonu is with the Department of Emergency Medicine, University of Kentucky, Lexington, KY 40536 USA.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer
Zhou, Guanglin, Barbieri, Sebastiano
Generating realistic synthetic electronic health records (EHRs) holds tremendous promise for accelerating healthcare research, facilitating AI model development and enhancing patient privacy. However, existing generative methods typically treat EHRs as flat sequences of discrete medical codes. This approach overlooks two critical aspects: the inherent hierarchical organization of clinical coding systems and the rich semantic context provided by code descriptions. Consequently, synthetic patient sequences often lack high clinical fidelity and have limited utility in downstream clinical tasks. In this paper, we propose the Hierarchy- and Semantics-Guided Transformer (HiSGT), a novel framework that leverages both hierarchical and semantic information for the generative process. HiSGT constructs a hierarchical graph to encode parent-child and sibling relationships among clinical codes and employs a graph neural network to derive hierarchy-aware embeddings. These are then fused with semantic embeddings extracted from a pre-trained clinical language model (e.g., ClinicalBERT), enabling the Transformer-based generator to more accurately model the nuanced clinical patterns inherent in real EHRs. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that HiSGT significantly improves the statistical alignment of synthetic data with real patient records and supports robust downstream applications such as chronic disease classification. By addressing the limitations of conventional raw code-based generative models, HiSGT represents a significant step toward clinically high-fidelity synthetic data generation and a general paradigm suitable for interpretable medical code representation, offering valuable applications in data augmentation and privacy-preserving healthcare analytics.
- Oceania > Australia > Queensland (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
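The HiSGT abstract mentions a hierarchical graph that encodes parent-child and sibling relationships among clinical codes. A minimal Python sketch of deriving both edge types from a child-to-parent mapping follows. The function name `build_hierarchy_edges`, the input format, and the example codes are assumptions for illustration and do not reflect the paper's actual graph construction.

```python
from collections import defaultdict
from itertools import combinations

def build_hierarchy_edges(parent_of):
    """Return (parent_child_edges, sibling_edges) from a child -> parent mapping."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    # One directed edge per (parent, child) pair.
    parent_child = sorted((p, c) for c, p in parent_of.items())
    # One undirected edge per pair of codes sharing the same parent.
    sibling = sorted(
        pair
        for kids in children.values()
        for pair in combinations(sorted(kids), 2)
    )
    return parent_child, sibling

# Illustrative ICD-10-style fragment (assumed example data).
parent_of = {
    "E11.2": "E11",
    "E11.9": "E11",
    "E11": "E08-E13",
    "E10": "E08-E13",
}
parent_child, sibling = build_hierarchy_edges(parent_of)
print(parent_child)
print(sibling)
```

The two edge lists could then be passed to any graph neural network library as separate relation types.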